I.  INTRODUCTION


This  report  delineates  the  research  that  was  performed  during  the 
period  of  July  1,  1979  through  June  30,  1980.  The  organization  of  the 
report  is  as  follows: 

Section II lists the specific problems that were studied during this
period. Section III gives a brief overview of the significant results
developed during this research period. Section IV lists all publications
and conference presentations relating to this research, along with all
personnel who were associated with the project. Finally, Section V
outlines both the present status of the work and the future direction of
the continuing research.

II.  LIST  OF  RESEARCH  OBJECTIVES 


The  research  that  was  conducted  here  focused  on  the  following 
specific  problems: 

(a)  Development  of  error-correcting  codes  that  are  effective  against 
unidirectional  errors.  Also,  the  design  of  self-checking 
decoders  for  these  codes. 

(b)  The  development  of  techniques  that  facilitate  the  use  of  ROMs  as 
basic  building  blocks. 

(c)  The  study  of  various  aspects  of  bridging  faults  that  appear  in 
integrated  circuits. 

(d)  The  development  of  techniques  to  diagnose  faults  that  occur  in 
closed  flow  networks. 
(e)  The development of a uniform representation of memory processor
interconnection networks, as well as the study of its applications.


III.  SUMMARY  OF  RESEARCH  RESULTS 


This  section  summarizes  the  important  results  that  were  developed  in 
the various areas just listed. With the sole exception of (b), all of this
work has been documented in papers that have
appeared  or  will  appear  shortly.  Consequently,  for  a  more  technical 
discussion  of  the  results,  the  reader  may  refer  to  the  appropriate 
papers,  listed  in  Section  IV  and/or  cited  throughout  this  section. 

(a)  Error  Correcting  Codes  and  Decoders  for  Unidirectional  Errors 

As the designers of various fault-tolerant computers have observed, the
errors anticipated in computers can be quite different from those
appearing in communication systems.

A good example of this is the class of unidirectional errors [2] that
occur in LSI circuits such as ROMs and PLAs. As the designers of the ESS
computers have observed, failures in power supplies, decoders, word lines,
and similar components cause unidirectional errors.

There is a fundamental difference between these unidirectional errors
and the so-called symmetric/asymmetric errors that occur in communication
channels, as seen below:

Let X be the erroneous word, which may represent data received from a
noisy channel, memory, etc.


If  the  likely  errors  are  symmetric,  then  X  may  contain  both  1  to  0 
and  0  to  1  errors. 

If the likely errors are asymmetric, then X may contain only one type
of error, either 1 to 0 or 0 to 1, and this type is fixed and known
a priori.

If the likely errors are unidirectional, then X may contain either 1
to 0 or 0 to 1 errors, but the error type is not fixed and therefore is
not known a priori. Thus, both 1 to 0 and 0 to 1 type errors can occur
(although not in the same codeword).
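The three definitions above differ only in which bit-flip directions may appear in a single erroneous word. A minimal Python sketch of that distinction follows; the function names and the bit-string representation are assumptions made for illustration and do not come from the report.

```python
def flip_counts(sent, received):
    """Count 1->0 and 0->1 bit flips between two equal-length bit strings."""
    if len(sent) != len(received):
        raise ValueError("words must have the same length")
    ones_to_zeros = sum(s == "1" and r == "0" for s, r in zip(sent, received))
    zeros_to_ones = sum(s == "0" and r == "1" for s, r in zip(sent, received))
    return ones_to_zeros, zeros_to_ones


def error_pattern(sent, received):
    """Describe which flip directions appear in one erroneous word X."""
    down, up = flip_counts(sent, received)
    if down == 0 and up == 0:
        return "no error"
    if down and up:
        # Both directions in one word: possible only under the symmetric model.
        return "both directions"
    # A single direction is consistent with the asymmetric and unidirectional
    # models; they differ in whether the direction is known in advance.
    return "1 to 0 only" if down else "0 to 1 only"
```

Note that a single observed word cannot distinguish the asymmetric model from the unidirectional one; the difference lies in whether the direction is fixed and known before the word is received.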

For LSI circuits, unidirectional errors are likely to occur in addition
to the usual symmetric errors. However, no codes had been available that
were effective against this combination of symmetric and unidirectional
errors. A major contribution of last year's research, therefore, was to
develop precisely such codes.

A class of (systematic) codes has been developed that can correct
symmetric errors and also detect all unidirectional errors. These codes
also provide varying degrees of error control capability, suited to
different applications. Decoding algorithms for these codes have been
developed as well. Self-checking implementations of these decoders have
also been proposed. These results have recently appeared in a paper
published in Transactions on Computers.
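The construction of these codes is not given in this report. As a stand-in illustration of what "detecting all unidirectional errors" means, the sketch below uses the classical Berger code, which appends the binary count of zeros in the information bits. It is not the code class developed in this research, and it only detects errors; it does not correct symmetric errors. All function names are illustrative assumptions.

```python
from itertools import combinations, product


def berger_encode(info):
    """Append the binary count of zeros in `info` (a list of 0/1) as check bits."""
    n_check = len(info).bit_length()  # equals ceil(log2(k + 1)) for k >= 1
    zeros = info.count(0)
    check = [(zeros >> i) & 1 for i in reversed(range(n_check))]
    return info + check


def berger_valid(word, k):
    """Return True if the check bits equal the number of zeros in the first k bits."""
    info, check = word[:k], word[k:]
    value = 0
    for bit in check:
        value = (value << 1) | bit
    return value == info.count(0)


def detects_all_unidirectional(k):
    """Exhaustively flip every non-empty set of 1s (or of 0s) in every codeword."""
    for info in product([0, 1], repeat=k):
        codeword = berger_encode(list(info))
        for from_bit in (0, 1):
            positions = [i for i, b in enumerate(codeword) if b == from_bit]
            for size in range(1, len(positions) + 1):
                for chosen in combinations(positions, size):
                    corrupted = list(codeword)
                    for i in chosen:
                        corrupted[i] ^= 1
                    if berger_valid(corrupted, k):
                        return False  # an undetected unidirectional error
    return True
```

The detection works because 1 to 0 errors can only raise the zero count while lowering the check value, and 0 to 1 errors do the opposite, so the two can never agree again after a unidirectional error.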

In addition, this paper develops certain basic results regarding the
algebraic structure of symmetric error correcting - unidirectional error
detecting codes.


(b)  Decomposition of ROMs


The size of a ROM is an exponential function of the number of its
inputs. Thus, the number of inputs is strictly limited by the maximum
number of bits that can be placed on a chip. Hence, one problem of much
practical significance is the decomposition of a given m-input/n-output
function so that it can be realized by p-input/q-output ROMs with fewer
inputs (p < m).


A rather simplistic solution to this problem uses $2^{m-p}$ ROMs along
with a decoder of (m-p) inputs and $2^{m-p}$ outputs; this scheme is
practical only where (m-p) is very small.

An  alternate  solution  to  this  problem  is  to  realize  the  function  by 
using  two  levels  of  ROMs.  This  technique  can  be  much  more  efficient  than 
the  above  since  the  number  of  ROMs  grows  only  linearly  with  m  and  p. 

As an example, consider a 16-input/4-output function, f. Implementing f
with a single ROM requires a 256k-bit ROM. However, currently available
ROMs are limited in size to 64k bits. There are two possible solutions.
One is to use a decoder and four of these 64k-bit ROMs. Alternatively, if
f could be decomposed in terms of three 8-input/4-output functions $f_1$,
$f_2$ and $f_3$, where $f = f_3(f_1, f_2)$, then one can realize f using
three 1k-bit ROMs. This would be a considerable saving when compared to
both the single-chip and the decoder implementations.
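The bit counts in this example, and the way a two-level realization is evaluated, can be checked with a short Python sketch. The component functions f1, f2 and f3 below are arbitrary placeholders, and the split of the 16 inputs into a high byte for f1 and a low byte for f2 is an assumption; the report does not say how the inputs are partitioned.

```python
def rom_bits(n_inputs, n_outputs):
    """Storage of a ROM: one n_outputs-bit word per input combination."""
    return (2 ** n_inputs) * n_outputs


def build_rom(func, n_inputs):
    """Tabulate `func` over every address, as a ROM would store it."""
    return [func(address) for address in range(2 ** n_inputs)]


# Hypothetical 8-input/4-output component functions (placeholders only).
def f1(x):
    return (x >> 4) ^ (x & 0xF)


def f2(x):
    return ((x >> 4) + (x & 0xF)) & 0xF


def f3(x):
    return ((x >> 4) * 3 + (x & 0xF)) & 0xF


ROM1, ROM2, ROM3 = build_rom(f1, 8), build_rom(f2, 8), build_rom(f3, 8)


def two_level(x):
    """Evaluate f = f3(f1, f2) on a 16-bit input using three 1k-bit ROMs."""
    high, low = x >> 8, x & 0xFF          # assumed input partition
    return ROM3[(ROM1[high] << 4) | ROM2[low]]


single_rom = rom_bits(16, 4)              # 262144 bits = 256k
decoder_scheme = 4 * rom_bits(14, 4)      # four 64k-bit ROMs
two_level_scheme = 3 * rom_bits(8, 4)     # three 1k-bit ROMs = 3072 bits
```

Under these assumptions the two-level scheme stores 3072 bits against 262144 for either alternative, which is the saving the example refers to.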

However,  such  a  decomposition  may  or  may  not  be  possible  for  a  given 
function.  Thus,  the  problem  is  to  find  such  a  decomposition  if  it 
exists.  We  have  formulated  a  novel  technique  for  such  functional 
decomposition,  as  described  below: 

A p-input/q-output function can be expressed as a function over a
Galois field, GF($2^t$), where t = gcd(p,q). Thus, the number of Galois
functions to be realized is k = q/t, a much smaller number than q, the
number of binary functions. For example, in the case described earlier,
we have k = 1, and thus f is a single function over GF($2^4$). Each of
these k functions may be decomposed by using a matrix technique that we
have now developed. This matrix technique can easily be implemented by
computer, and is based on the generalized Reed-Muller expansion of Galois
functions.
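The parameters of this Galois-field view follow directly from p and q. The Python sketch below computes them and regroups a binary word into t-bit field symbols. The helper names are illustrative, the count of p/t field variables is an inference rather than something the report states, and the matrix decomposition technique itself is not reproduced here.

```python
from math import gcd


def galois_view(p, q):
    """Parameters for viewing a p-input/q-output function over GF(2^t)."""
    t = gcd(p, q)
    return {
        "t": t,                # symbol width in bits
        "field_size": 2 ** t,  # number of elements in GF(2^t)
        "k": q // t,           # number of Galois functions to realize
        "variables": p // t,   # assumed: inputs grouped into t-bit symbols
    }


def to_symbols(word, width, t):
    """Split a width-bit integer into width // t symbols of t bits, MSB first."""
    mask = (1 << t) - 1
    return [(word >> shift) & mask for shift in range(width - t, -1, -t)]
```

For the 16-input/4-output example, `galois_view(16, 4)` gives t = 4 and k = 1, matching the single function over GF($2^4$) described above.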

(c)  Bridging  faults  in  integrated  circuits 

The traditional stuck-at fault model has become inadequate with the
increasing complexity of LSI circuits. Therefore, growing attention has
recently been focused on a different class of faults known as short
circuit (bridging) faults. These bridging faults are caused by a variety
of failures.

At  the  chip  level,  bridging  faults  are  caused  by  failures  of 
insulation  that  may  occur  between  adjacent  layers  of  metallization,  or 
they  may  be  due  to  a  short  that  may  occur  between  conductors  in  the  same 
layer.  This  short  could  be  the  result  of  improper  masking  and/or  etching. 

Bridging  faults  also  occur  at  the  input/output  pins  of  a  chip,  and  at 
the  links  between  circuit  boards;  this  is  due  to  defects  in  bonding  and 
soldering. 

The most common effect of a bridging fault between two lines is that the
faulty lines take on the AND or the OR of their signals. Which function
is produced depends on the technology used.
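The AND/OR bridging model can be made concrete with a small simulation. In the Python sketch below, the circuit z = (a AND b) OR (c AND d) and the choice of bridging input lines a and c are hypothetical examples, not taken from the report.

```python
from itertools import product


def good_circuit(a, b, c, d):
    """Fault-free example circuit: z = (a AND b) OR (c AND d)."""
    return (a & b) | (c & d)


def bridged_circuit(a, b, c, d, kind):
    """Same circuit with input lines a and c shorted (wired-AND or wired-OR)."""
    if kind not in ("AND", "OR"):
        raise ValueError("kind must be 'AND' or 'OR'")
    shorted = (a & c) if kind == "AND" else (a | c)
    # Both bridged lines carry the same resolved value.
    return good_circuit(shorted, b, shorted, d)


def detecting_tests(kind):
    """Input vectors whose output differs between good and bridged circuits."""
    return [v for v in product([0, 1], repeat=4)
            if good_circuit(*v) != bridged_circuit(*v, kind)]
```

For instance, the vector (1, 1, 0, 0) exposes the wired-AND bridge because the short pulls line a to 0, whereas (1, 1, 1, 1) does not, since both lines already agree.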

Testing for these faults can, indeed, be a formidable problem. Recent
research conducted here has proven that an irredundant circuit can be
plagued with large numbers of undetectable faults. The occurrence of
such an undetectable fault can invalidate a test set (which was designed
to detect detectable faults).

This raises the serious practical problem of accepting a faulty circuit
as fault-free. Consequently, we have developed a procedure that
generates test sets for a restricted class of networks. These test sets
do not become invalid in the presence of undetectable faults. These
findings are published in [3].

We are now carrying out further research to extend the test generation
procedure to more general networks, such as PLAs.

(d)  Fault-diagnosis  of  closed-flow  networks 

Flow  networks  represent  a  model  for  computer  communication  networks, 
multiple  resource  computer  systems,  transportation  networks,  etc.  A 
closed  (open)  network  is  defined  as  a  flow  network  without  (with) 
external  inputs/outputs.  An  open  network  can  be  restructured  as  a  closed 
network  with  additional  nodes. 
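One way to picture the restructuring of an open network as a closed one is to add a single environment node that absorbs every external output and supplies every external input. The Python sketch below assumes this construction and assumes that flow is conserved at every node; neither detail is spelled out in the report, and the names are illustrative.

```python
def close_network(edges, external_inputs, external_outputs, env="ENV"):
    """Return a closed network by routing all external flow through `env`.

    edges: dict mapping (from_node, to_node) -> flow
    external_inputs / external_outputs: dict mapping node -> external flow
    """
    closed = dict(edges)
    for node, flow in external_inputs.items():
        closed[(env, node)] = flow   # environment feeds the input nodes
    for node, flow in external_outputs.items():
        closed[(node, env)] = flow   # output nodes drain to the environment
    return closed


def node_balance(edges):
    """Net flow (inflow minus outflow) at every node of a network."""
    balance = {}
    for (src, dst), flow in edges.items():
        balance[src] = balance.get(src, 0) - flow
        balance[dst] = balance.get(dst, 0) + flow
    return balance
```

In the closed network every node, including the added one, should show a balance of zero when the original external inflow equals the external outflow.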

We  have  developed  a  graph-theoretic  technique  that  diagnoses  link 
faults  in  closed  flow  networks.  Also,  we  have  formulated  a  procedure  to 
diagnose  all  single  link  (edge)  faults  in  an  n-node  network  by  observing 
the  flow  through  (n-1)  links.  This  procedure  is  based  on  a  flow 
causality  relationship  which  completely  characterizes  the  closed  flow 
networks.  These  results  will  be  published  shortly  in  [4]. 

Further work is also being carried out that extends the results to
include node faults.


(e)  Characterization  and  Fault-Diagnosis  of  memory  processor 
interconnection  networks 


Single and multistage interconnection networks are used for memory
processor interconnection in parallel processor systems. These networks
communicate data and results between different processors and memories.

A general mathematical framework has been developed which provides a
uniform characterization of various interconnection networks. Using this
formulation, it has now been established that various existing networks
are, indeed, functionally equivalent. Further, new insights were obtained
regarding the class of permutations admissible by various networks; these
results appear in [5].
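To illustrate what it means for a permutation to be admissible by a network, the Python sketch below checks a permutation against one well-known multistage network, the omega (shuffle-exchange) network, using destination-tag routing. The choice of the omega network is an assumption made for illustration; the report does not name the networks studied, and this is not the uniform framework of [5].

```python
def omega_admissible(perm):
    """Return True if `perm` passes an omega network in one conflict-free pass.

    perm[s] is the destination of source s; len(perm) must be a power of two.
    """
    n_ports = len(perm)
    n_stages = n_ports.bit_length() - 1
    if n_ports < 2 or n_ports != 1 << n_stages:
        raise ValueError("number of ports must be a power of two")
    positions = list(range(n_ports))
    for stage in range(n_stages):
        next_positions = []
        for src, pos in enumerate(positions):
            # Destination-tag routing: consume destination bits MSB first.
            dest_bit = (perm[src] >> (n_stages - 1 - stage)) & 1
            # Perfect shuffle, then the switch sets the low bit.
            next_positions.append(((pos << 1) & (n_ports - 1)) | dest_bit)
        if len(set(next_positions)) != n_ports:
            return False  # two paths need the same switch output
        positions = next_positions
    return True
```

With eight ports, the identity permutation is admissible while the bit-reversal permutation is not, which shows that a given network realizes only a subset of all permutations in a single pass.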

Further  research  on  memory  processor  networks  is  currently  being 
carried  out,  chiefly  in  the  areas  of  fault-diagnosis  and 
fault-tolerance.  Since  these  interconnection  networks  form  such  a 
central  part  of  the  system,  their  reliable  operation  is  of  crucial 
importance. Our objective here is to develop a general procedure for
deriving effective test sets that can detect and locate faults in these
networks. Techniques that provide graceful degradation of these
networks  are  also  being  studied.  Preliminary  results  indicate  that  some 
of  these  networks  can  operate  successfully  at  half  the  original  speed  in 
the  presence  of  a  single  fault. 


IV. 
V.  RESEARCH IN PROGRESS AND ITS FUTURE DIRECTION


This section outlines the research that remains to be carried out.


(a)  Efficient  codes  for  unidirectional  errors 


Recently  we  have  found  another  class  of  codes  that  are  different  from 
those  we  have  described  in  the  last  section.  These  codes  are  promising 
because  they  have  much  higher  efficiency.  There  is  also  the  likelihood 
that  this  latter  class  of  codes  may  possess  optimum  efficiency. 

Also, we have been investigating a class of coset codes which are
especially effective against unidirectional errors. This class of codes
appears to have simple encoding and decoding algorithms.

(b)  Testable design of programmable logic arrays

Like ROMs, programmable logic arrays (PLAs) possess several attractive
features. As a result, PLAs are finding increasing use. The testing of
these PLAs, though, is certainly a complex problem because of:

(i)  the wide diversity of faults that can occur in these circuits,
and

(ii)  the susceptibility of PLAs to undetectable faults.

Consequently, we are developing a technique to incorporate certain
testability aspects directly into the design itself. The resulting PLA is
expected to have a function-independent test set.

It should be pointed out that our research in this area differs from
other approaches in that we consider all three types of faults that can
occur in PLAs: bridging faults, crosspoint faults and stuck-at faults.


(c)  Fault-diagnosis  of  memory  processor  interconnection  networks 

Procedures are now being developed that detect and locate faults in
various interconnection networks. Also being studied are techniques that
reconfigure the faulty network so that the interconnection capability of
the fault-free network can be preserved. Specifically, given a network
with faults, we are investigating the number of passes through the faulty
network that are required to realize all the permutations that are
admissible by the fault-free network. Thus far, we have been able to show
that in the presence of a single fault in certain networks, two passes
are sufficient to realize all of the desired permutations.

VI.  CONCLUSION 


This  report  outlines  the  progress  of  our  ongoing  research  supported 
by  the  AFOSR.  Major  efforts  are  currently  being  devoted  to  the  areas  of 
testing  and  testable  design  of  PLAs.  Significant  work  will  also  be 
directed  toward  the  development  of  new  codes  which  can  prove  attractive 
in  the  design  of  fault-tolerant  LSI  circuits.  Finally,  the  research  that 
has  already  been  initiated  on  the  fault-tolerant  aspects  of 
interconnection  networks  and  computer  communication  networks  is  expected 
to  be  expanded  upon. 